Modeling data using directional distributions: Part II
Authors
Abstract
High-dimensional data is central to most data mining applications, yet only recently has it been modeled via directional distributions. In [Banerjee et al., 2003] the authors introduced the use of the von Mises-Fisher (vMF) distribution for modeling high-dimensional directional data, particularly for text and gene expression analysis. The vMF distribution is one of the simplest directional distributions. The Watson, Bingham, and Fisher-Bingham distributions have an increasing number of parameters and correspondingly greater modeling power. This report is a follow-up study to the initial development in [Banerjee et al., 2003] and presents Expectation Maximization (EM) procedures for estimating the parameters of a mixture of Watson (moW) distributions. Parameter estimation for these distributions is numerically much harder than for the vMF distribution. We develop new numerical approximations for estimating the parameters, which let us model real-life data more accurately. Our experimental results establish that for certain data sets the improved modeling power translates into better results.
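To make the numerical difficulty concrete, here is a minimal sketch of the Watson log-density in its standard textbook form, f(x; mu, kappa) = Gamma(d/2) / (2 pi^(d/2) M(1/2, d/2, kappa)) exp(kappa (mu^T x)^2), where M is Kummer's confluent hypergeometric function. The function names `kummer_m` and `watson_log_density` are hypothetical, and the plain power series used for M is an assumption chosen for brevity. It is not the approximation developed in the report, and it loses accuracy for large |kappa| or high dimension.

```python
import math

def kummer_m(a, b, x, tol=1e-15, max_terms=10000):
    """Kummer's function M(a, b, x) summed from its power series.

    Illustrative only: the naive series suffers from cancellation for
    large negative x and from overflow for large positive x.
    """
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        # Ratio of consecutive series terms: (a+k)/(b+k) * x/(k+1).
        term *= (a + k) / (b + k) * x / (k + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def watson_log_density(x, mu, kappa):
    """Log-density of a Watson distribution at unit vector x.

    mu is the unit mean axis and kappa the concentration parameter.
    """
    d = len(x)
    dot = sum(xi * mi for xi, mi in zip(x, mu))
    log_norm = (math.lgamma(d / 2) - math.log(2)
                - (d / 2) * math.log(math.pi)
                - math.log(kummer_m(0.5, d / 2, kappa)))
    # The density depends on x only through (mu . x)^2, so x and -x
    # receive the same value: the distribution models axes, not directions.
    return log_norm + kappa * dot * dot
```

With kappa = 0 the density reduces to the uniform distribution on the sphere. The normalizing constant is where the estimation difficulty concentrates, since the maximum-likelihood equation for kappa involves a ratio of Kummer functions and has no closed-form solution.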
Similar resources
Modeling Data using Directional Distributions
Traditionally, multivariate normal distributions have been the staple of data modeling in most domains. For some domains, the model they provide is either inadequate or incorrect because it disregards the directional components of the data. We present a generative model for data that is suitable for modeling directional data (as can arise in text and gene expression clustering). We use m...
Bayesian Modeling of Directional Data with Acoustic and Other Applications
A direction is defined here as a multi-dimensional unit vector. Such unit vectors form directional data. Closely related to directional data are axial data, for which each direction is equivalent to the opposite direction. Directional data and axial data arise in various fields of science. In probabilistic modeling of such data, probability distributions are needed which account for the structure ...
Using Weighted Distributions for Modeling Skewed, Multimodal and Truncated Data
When the observations reflect a multimodal, asymmetric, or truncated structure, or a combination of these, using the usual unimodal and symmetric distributions leads to misleading results. Therefore, distributions capable of modeling skewness, multimodality, and truncation have always been of central interest in the statistical literature. There are different methods to contract ...
New families of wrapped distributions for modeling skew circular data
Tomasz J. Kozubowski, Department of Mathematics, University of Nevada, Reno. Abstract: We discuss circular distributions obtained by wrapping the classical exponential and Laplace distributions on the real line around the circle. We present explicit forms for their densities and distribution functions, as well as their trigonometric moments and related parameters, and discuss main properties of t...
Modeling Static Bruising in Apple Fruits: A Comparative Study, Part II: Finite Element Approach
Mechanical damage degrades fruit quality along the chain from production to consumption. Damage is due to static, impact, and vibration loads during processes such as harvesting, transportation, sorting, and bulk storage. In the present study, finite element (FE) models were used to simulate the process of static bruising in apple fruits caused by contact of the fruit with a hard surface. Thr...